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There is growing evidence in the epidemiologic literature of the 
relationship between air pollution and adverse health outcomes. Pre¬ 
diction of individual air pollution exposure in the Environmental Pro¬ 
tection Agency (EPA) funded Multi-Ethnic Study of Atheroscelero- 
sis and Air Pollution (MESA Air) study relies on a flexible spatio- 
temporal prediction model that integrates land-use regression with 
kriging to account for spatial dependence in pollutant concentrations. 

Temporal variability is captured using temporal trends estimated via 
modified singular value decomposition and temporally varying spa¬ 
tial residuals. This model utilizes monitoring data from existing reg¬ 
ulatory networks and supplementary MESA Air monitoring data to 
predict concentrations for individual cohort members. 

In general, spatio-temporal models are limited in their efficacy 
for large data sets due to computational intractability. We develop 
reduced-rank versions of the MESA Air spatio-temporal model. To 
do so, we apply low-rank kriging to account for spatial variation in 
the mean process and discuss the limitations of this approach. As an 
alternative, we represent spatial variation using thin plate regression 
splines. We compare the performance of the outlined models using 
EPA and MESA Air monitoring data for predicting concentrations of 
oxides of nitrogen (NOa,)—a pollutant of primary interest in MESA 
Air—in the Los Angeles metropolitan area via cross-validated R^. 

Our findings suggest that use of reduced-rank models can improve 
computational efficiency in certain cases. Low-rank kriging and thin 
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plate regression splines were competitive across the formulations con¬ 
sidered, although TPRS appeared to be more robust in some settings. 

1. Introduction. There is growing evidence in the epidemiologic litera¬ 
ture of the relationship between air pollution and adverse health outcomes. 
Early findings were based on somewhat crude regional, and possibly tem¬ 
porally specific, assignment of exposures [Dockery et al. (1993), Pope et al. 
(2002), Samet et al. (2000)]. Yet, methods for assigning individual exposure 
to cohort study participants have become much more sophisticated. Recent 
studies have assigned individual exposure using the value measured at the 
nearest monitoring location [Miller et al. (2007), Ritz, Wilhelm and Zhao 
(2006)]; using “land use regression” estimates based on spatially distributed 
or Geographic Information Systems (GIS) based covariates [Brauer et al. 
(2003), Hoek et al. (2008), Jerrett et al. (2005a)]; and by interpolation with 
geostatistical methods such as kriging and semi-parametric smoothing [Jer¬ 
rett et al. (2005b), Kiinzli et al. (2005), Paciorek et al. (2009)]. 

Motivated by the Multi-Ethnic Study of Atheroscelerosis and Air Pol¬ 
lution (MESA Air) study [Kaufman et al. (2012)], Szpiro et al. (2010), 
Sampson et al. (2011) and Lindstrom et al. (2013) developed a flexible 
spatio-temporal prediction model based on monitoring data from existing 
regulatory networks as well as supplementary MESA Air monitoring data 
to predict concentrations for individual MESA cohort members. This work 
integrates land-use regression with kriging to account for spatial dependence 
in pollutant concentrations. Temporal variability is captured using tempo¬ 
ral trends estimated via sparse singular value decomposition and temporally 
varying spatial residuals [Puentes, Guttorp and Sampson (2006), Sampson 
et al. (2011), Szpiro et al. (2010)]. 

In general, spatio-temporal models are limited in their efficacy for large 
data sets due to computational intractability. For example, in the purely 
spatial setting, computation typically is of the order O(re^), where n is the 
number of spatial locations. The computational effort for log-likelihood eval¬ 
uation of the MESA Air spatio-temporal model typically grows at least as 
fast, but slower than 0{N^), where N is the total number of spatio-temporal 
observations [Lindstrom et al. (2013)]. Methods for reducing the computa¬ 
tional burden in spatio-temporal models are becoming more common in the 
spatial statistics literature. Several authors have proposed dynamic frame¬ 
works for modeling residual spatial and temporal dependence, although these 
approaches continue to suffer from computational intractability [Gelfand, 
Banerjee and Gamerman (2005), Stroud, Muller and Sanso (2001)]. In the 
large spatial data context, approximate likelihood and sampling-based ap¬ 
proaches have been proposed to reduce computational burden [Fuentes (2007), 
Pace and LeSage (2009)]. An alternative to approximate methods involves 
reducing the spatial process to a A-dimensional subspace {K -C n) in order 
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Fig. 1. Map of AQS and MESA Air monitoring locations in Los Angeles, California. 
“Home outdoor” monitors have been jittered for participant confidentiality. 

to increase computational efficiency [Banerjee et al. (2008), Crainiceanu, 
Diggle and Rowlingson (2008), Kammann and Wand (2003), Nychka and 
Saltzman (1998), Stein (2007, 2008)]. These so-called “low-rank” or “reduced- 
rank” approaches can reduce computation to 0{K^). 

In the current work, we develop reduced-rank versions of the spatio- 
temporal model outlined in Lindstrom et al. (2013), Szpiro et al. (2010). 
Specifically, we apply the approach proposed by Kammann and Wand (2003) 
to achieve low-rank kriging to account for spatial variation in the mean pro¬ 
cess and spatially varying temporal trends. We discuss the limitations of 
this approach and, as an alternative, represent spatial variation using thin 
plate regression splines [Wood (2003)]. We compare the performance of the 
outlined models using Environmental Protection Agency (EPA) and MESA 
Air monitoring data for predicting oxides of nitrogen (NO^,) concentrations 
in the Los Angeles metropolitan area. 

2. Description of data. 

2.1. Air Quality System (AQS). The national AQS network of regula¬ 
tory monitors, managed by the EPA, reports concentrations of a wide variety 
of air pollutant concentrations on an ongoing basis, most typically hourly 
averages. Eor this study, we include NO^ measurements from 21 AQS mon¬ 
itors in the Los Angeles area, one of six metropolitan areas where MESA 
Air cohort members live. Monitor locations are shown in Eigure 1 (left). As 
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MESA Air supplementary monitoring is done at the 2-week average scale, 
we aggregate AQS monitoring data to 2-week averages. Due to skew in the 
data, all 2-week averages are log transformed. 

2.2. MESA Air. As part of the MESA Air project goals to provide high 
quality individual exposure prediction, additional monitoring data were col¬ 
lected in each of the study’s six geographic regions, including Los Angeles. 
The goal of the supplementary monitoring was to provide geographically 
complementary data to the AQS monitoring data and to systematically span 
the design space based on proximity to traffic. Additionally, supplementary 
monitoring data included measurements collected at a subset of cohort par¬ 
ticipant homes. The sampling strategy is described in more detail by Cohen 
et al. (2009). 

The MESA Air supplementary data is comprised of three classes of mon¬ 
itors, which we refer to as “fixed site,” “home outdoor” and “community 
snapshot.” There are a total of five “fixed sites” included in this study in 
the Los Angeles area. These “fixed-sites” began measuring 2-week average 
concentrations in November of 2005, for a total of 426 measurements by 
June 1, 2009. A total of 84 “home outdoor” locations were included in this 
study. These sites were sampled during 2-week periods starting in May of 
2006 and ending in February of 2008, for a total of 155 measurements. The 
sampling plan calls on each home to be measured two times during different 
seasons. Last, the “community snapshot” sub-campaign consists of 177 sites 
measured in three rounds of spatially rich sampling during single 2-week pe¬ 
riods from July 5, 2006 to January 1, 2007, for a total of 449 measurements. 
In each round of the “community snapshot” monitoring, most monitors were 
clustered in groups of six, with three on each side of a major roadway at 
distances of about 50, 100 and 300 meters, and locations were chosen to span 
the domain of various land-use categories and to cover a wide geographic 
region. All MESA Air monitoring locations as of June 1, 2009 are displayed 
in Figure 1. Likewise, temporal coverage and sampling frequency during the 
study period for each monitoring location and type is depicted in Figure 2. 
Table 1 provides summary statistics on the native and log-scales for both 
EPA and MESA Air data. 

2.3. GIS. In addition to the monitoring data, spatial prediction at loca¬ 
tions where there are no measurements rely heavily on GIS-based covariates 
and so-called “land-use regression” techniques [Jerrett et al. (2005a)]. In this 
paper, we considered a limited set of geographic covariates: (i) log distance 
to Al, A2 or A3 roadway [TeleAtlas (2000)], (ii) log Caline3QHCR point pre¬ 
dictions averaged over 9 kilometer buffer [Eckhoff and Braverman (1995)], 
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Fig. 2. Schematic of sampling schedule for AQS and MESA Air monitors between 1999 
and 2012. Eaeh point represents a two-week sampling period. 

(iii) distance to nearest coast [TeleAtlas (2000)], (iv) distance to city hall 
[TeleAtlas (2000)], (v) normalized difference vegetation index averaged over 
250 meter buffer [Carroll et al. (2004)], (vi) log elevation, and (vii) percent 
impervious surface in 50 meter buffer [Fry et al. (2011)]. 


Table 1 

Summary of statistics of NOx monitoring data at EPA AQS and MESA Air 
supplementary monitoring sites 



NO. 

ppb 

log (NO. 

ppb) 

Type of site 

Mean 

SD 

Mean 

SD 

AQS/fixed site 

2-wk 

53.30 

40.10 

3.72 

0.75 

LTA 

45.35 

17.27 

3.74 

0.39 

Community snapshot 

2006-07-05 (summer) 

34.24 

11.49 

3.47 

0.39 

2006-10-25 (fall) 

75.09 

23.47 

4.27 

0.32 

2007-01-31 (winter) 

95.29 

26.99 

4.51 

0.30 

Home outdoor 

45.65 

28.30 

3.63 

0.64 
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3. Methods. 

3.1. Review of full-rank spatio-temporal model. The existing spatio- 
temporal model as initially described by Szpiro [Szpiro et al. (2010)] takes 
the form 


y[s,t) = p.[s,f) + v[s,t), 

where y{s,t) is the log two-week average of pollutant measurements at lo¬ 
cation s and time t, p,{s,t) is the mean field and y{s,t) is the residual field. 
The mean field, y, is defined as a linear combination of temporal basis func¬ 
tions with spatially varying coefficients. The spatially varying coefficients 
are comprised of a land-use regression component in addition to spatially 
structured random fields. These coefficients capture spatial heterogeneity in 
the amplitude of the temporal basis functions. As such, the mean field is 
written as 

m 

lJ‘{s,t) = +/3j(s) -LV’j(s)}/i(i), 

i=i 

where the Xj are design matrices containing GIS/land-use covariates of 
dimension n x (pj -\- 1), where n is the total number of observed sites and 
OLj is a vector of regression land-use regression coefficients of dimension 
Pj -|- 1 X 1. The /3j(s) where s = (si,...,s„) are Gaussian spatial random 
helds distributed as 


Here, T,p.(6j) is the covariance matrix of dimension nxn indexed by the 
vector of parameters 6j. Generally, we assume a spatial exponential decay 
model with range (fj and partial sill r?. The ^j(s) are i.i.d. random effects 
distributed as 

V^,(s)~iV(0,a|l). 

Note il^j{s) can equivalently be thought of as the nugget for the /?j(s)-held. 
The original formulation of this model did not include a provision for a 
nugget [Szpiro et al. (2010)], although more recent work allowed for but 
did not utilize this parameter [Lindstrom et al. (2013)]. We later discuss 
the implications of excluding the nugget for computation and predictive 
performance. 

The fj{t) are temporal basis functions with fi{t) = 1 for all t (typically 
m is small, <3) estimated by modified singular value decomposition. See 
Fuentes, Guttorp and Sampson (2006), Szpiro et al. (2010), Sampson et al. 
(2011) for a more thorough discussion of trend estimation. Figure 3 depicts 
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Smoothed Temporal Trend Functions 



Fitted Trend at Site LC001 



Date 


Fig. 3. (Top) Two temporal basis functions estimated by modified singular value decom¬ 
position from Los Angeles monitoring data; (middle and bottom) raw log-transformed data 
and fits to the two temporal basis functions at sites near (LCOOl) and far (06037002) from 
the coastline. 

these smooth temporal basis functions and their fit to the EPA and MESA 
Air NO 3 ; monitoring data at two sites. 

Last, we specify the model for the residual field, Consistent with 

Lindstrom et al. (2013), Szpiro et al. (2010), Sampson et al. (2011), we as¬ 
sume that the mean model accounts for the mean structure and all temporal 
correlation. Thus, the spatio-temporal residuals are assumed to have zero 
mean and to be independent in time, so that 

Ks,t)~iV(O,St(0.)), 

where S* ( 0 j,) is a covariance matrix of dimension nt x rit and rit is the 
number of sites observed at time t with = N, the total number of 

observations. Once again, we assume that the i' field follows a spatial ex- 
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ponential decay model with range (f), partial sill and (possibly) nugget 
A concise representation of this model is given as 


( 1 ) 


Y = FXq + FB + FP + V, 


where Y is an Y x 1 vector of stacked responses y{s,t) (first varying s then 
t), F = {fst,is') is an Y X mn matrix that has elements 




st,is' 


fi{t), ifs = s', 

0, else, 


X is a block diagonal matrix with diagonal blocks {Xj}^^, a is an 
J2JLi{Pj + 1} X 1 stacked vector of the cxj, B is an mn x 1 vector of the 
stacked I3j, P is an mn x 1 vector of the stacked nuggets, ipj, and V is 
an Y X 1 vector of the stacked u (first varying s then t). This model is 
thus indexed by the land use regression coefficients, q, and the covariance 
parameters 

6»b = (6»i,...,0m), ej = {4ij,Tf), j = l,...,m, 


0P = (o-?,...,cr^), 
Ov = (0i.,r^,cj^). 


To simplify notation, we collect the covariance parameters into the vector 
3 = {6b, dp, 6v)- In the remainder of the manuscript, for the sake of brevity 
we suppress the dependence of covariance matrices on their respective pa¬ 
rameters, except where an explicit dependence is illustrative. 

Model (1) is typically fit using profile maximum likelihood methods, 
although full maximum likelihood and restricted maximum likelihood ap¬ 
proaches are also possible [Lindstrom et al. (2013)]. Sampson used a multi¬ 
stage “pragmatic” approach to fitting (1) and generating predictions [Samp¬ 
son et al. (2011)]. Lindstrom adapted the model to allow for time-varying 
covariates, although this extension is not presented here [Lindstrom et al. 
(2013)]. This model is implemented in the R-package, SpatioTemporal, 
available at http://cran.r-project. org/package=SpatioTemporal. 


3.2. Motivation for reduced-rank spatial smoothing. Although the above 
formulation of the model has been successful for predicting air pollution 
concentrations, we note two limitations of this formulation, particularly with 
respect to the /3-fields. First, we note that it is not natural to interpret the 
/3-fields as random effects since it is difficult to imagine the data generating 
mechanism that might give rise to such fields [Hodges and Clayton (2011), 
Hodges (2013)]. Second, the range parameters in the /3-fields tend to be 
challenging to estimate in practice. Moreover, Zhang showed that in the case 
of spatial generalized linear mixed models, this quantity is not consistently 
estimable [Zhang (2004)]. 
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As such, we consider a spline-based representation of the /3-fields in the 
mean model. To motivate, we note that the Gaussian spatial /3-helds, as 
dehned above, can be represented as spatial splines as follows. Let 



where Li is a matrix such that 

n = {C(||si-s,||)},_^.g5 

and <S is the set of observed spatial locations. For the exponential model, 
C{r) = exp{—|r|/(^}. It follows that the /3-fields can be expressed as 

where Sj ~MVN(0,rJl). The n columns of the matrix 17^/^ represent n 
spatial basis functions indexed by the parameter <pj. Written as such, the 
/3-helds can be viewed as random linear combinations of spatial basis func¬ 
tions. Exploiting the connection between linear mixed models and penalized 
splines, we can view the /3j-helds as penalized spatial splines with smooth¬ 
ing parameters [Ruppert, Wand and Carroll (2003)]. Having repre¬ 

sented the /3-helds as penalized splines, it is natural to consider penalized 
reduced-rank splines instead as a means of improving model performance 
and computational efficiency. 

We note that an analogous argument can be made for the residual held. 
However, it is also the case that the z^-held is well understood within the 
traditional framework of random effects models. That is, the i/-held captures 
extra random spatial variation that arises from time point to time point. 
Furthermore, the range parameter in the zz-held tends to be more stably 
estimated in practice due to the repeated measurements over time. 

In the following sections, we describe reduced-rank representations of the 
/3-helds using low-rank kriging and thin plate regression splines. 

3.3. Low-rank kriging. We follow the approach outlined by Kammann 
and Wand (2003) and Ruppert, Wand and Carroll (2003) for low-rank krig¬ 
ing (LRK) of the /3-helds. Specihcally, LRK is achieved by replacing fl with 
Zf7~^Z~'', where 

Z = {ddlSi — ^ill)}iG5je/C) ^ 

and JC is the set of spatial knot locations, k, of cardinality iL <C n. It follows 
that we can approximate /3j(s) by where Sj is now a iL-vector 

distributed as MVN(0, =rjl). We note that this approach bears strong 
resemblance to the predictive processes presented by Banerjee [Banerjee 
et al. (2008)]. In fact, Banerjee noted that LRK is a re-projection of his 
predictive process. As such, these approaches are computationally identical 
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despite the fact that the predictive process is derived formally from a full- 
rank parent process. 

Letting and B be the stacked vector of SjS, we can 

express the spatio-temporal model as 

(2) y = FXq:-LFZbB-LFP-L V. 

Model (2) can be re-expressed as 

Y~MVN(FXq;,S), 

where 

S = FZbS + FSpF^ + Sy, 

is a block-diagonal matrix with diagonal elements Lip is a 

block-diagonal matrix with diagonal elements and Ly is a block- 

diagonal matrix with diagonal elements log-likelihood is given 

by 

l{a, H I Y) oc - log |L| - (Y - FXq)^S-1(Y - FXq). 

Consistent with Szpiro et al. (2010), Lindstrom et al. (2013), we estimate 
regression coefficients a using the profile maximum likelihood. It is easy to 
show that 

a = (X^F’^S"^FX)”^X’^F’^S"^Y, 
so that the profile log likelihood is simplified to 

(3) Zp(H I Y) oc - log |S| - (Y - FXq)^S-^(Y - FXq), 

and the remaining parameters are estimated as those quantities that max¬ 
imize (3). We estimate all parameters using the L-BFGS-B algorithm as 
implemented in the optim function in stats package in R. This is an iter¬ 
ative method that allows for box constraints on all parameters [Byrd et al. 
(1995)]. 

Prediction is achieved by assuming a joint distribution between observed 
data Y and unobserved data Y*, 



where L** is the covariance of Y* and S.* is the cross covariance of Y 
and Y*. Predictions are based on the conditional expectation F'[Y*|Y] with 
MLEs plugged in, namely, 

Y* = F*X*a -L S*.L"^(Y - FXa) 
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with conditional prediction variance 

A drawback of LRK is the dependence of the basis functions on the range 
parameters j = 1,... ,m. Kammann and Wand, for purely spatial data, 
address this issue by hxing the value of this parameter at the maximum spa¬ 
tial distance observed in the data [Kammann and Wand (2003)]. Although it 
is attractive to condition on fixed spatial basis functions, arbitrary selection 
of these parameters could lead to worse predictive performance. The range 
parameters can be estimated from the data, albeit at the expense of more 
challenging numerical optimization and with the caveat that they may not 
be consistently estimable [Zhang (2004)]. 

An alternative approach which sidesteps these issues and leverages the 
spatial spline formulation calls for the use of alternative spline bases. Thin 
plate regression splines are a popular alternative, and we explore their ap¬ 
plication in the current problem below. 


3.4. Summary of thin plate regression splines. Thin plate regression 
splines (TPRS) present an alternative to the LRK approach and mitigate the 
issue of estimating the range parameter(s) [Wood (2003)]. Although these 
models are widely used (implementation is available in the R package mgcv, 
e.g.), we briefly summarize the approach with the goal of describing parallels 
between TPRS and LRK. 

Assume that we wish to estimate the function / based on (purely spatial) 
observations Y at locations s = (si,S 2 ) such that 

y^i = f{si) +£i 


by minimizing this penalized objective function 


|y-/(s)1| + a 



SiJS2 




dsj 


dsl 


dsi ds2 


It can be shown that the solution is given by 


2 

dsi ds2. 


( 4 ) 


f{s) = '^CiVi\\s- 

i=l 


3 

i=i 


where the Lj are linearly independent polynomials spanning the space of 
polynomials in TZ^ (of degree less than 2) and r/(r) = 2“^7r“^r^ log(r). Fur¬ 
ther, ^ and 7 are hxed unknown coefficients subject to the constraint = 
0 with Tij = ij{si) [Green and Silverman (1994)]. 
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Let E be a matrix so that Eij = 7 /(||si — Sj||). Wood presents a reduced- 
rank approximation of this problem, which is the solution to the uncon¬ 
strained optimization problem 

minimize||Y - - T 7 II + AC^W^Di^Wi^C, 

where is a iL — 3 x 1 vector of fixed unknown coefficients, UDU^ is 
the eigendecomposition of E so that the n columns of U are equal to the 
eigenvectors of E ordered by their associated eigenvalues from largest to 
smallest, D is a diagonal matrix of these eigenvalues, \Jk is a matrix of 
the first K columns of U, and Y)k is a matrix of the first K rows and 
columns of D. Last, ^k is a K x K — 3 orthogonal column basis such that 
= 0 (to account for the constraint) [Wood (2003)]. 

It is easy to see that this unconstrained optimization is equivalent to 
fitting the linear mixed model 

Y = T7 + U^DxWxC + £, 

where C* ~ MVN(0, cj^(W^Di^Wx)“^), e ~ MVN( 0 , u^I), and A = /cr^. 
Equivalently, let C* = where S* ~ MVN(0, cr|l), then 

the above equation becomes the following: 

Y = T 7 + + £. 

3.5. Formulation of P-fields as thin plate regression splines. We consider 
modeling the /3-fields as TPRS using the relationship between penalized 
splines and mixed models [Ruppert, Wand and Carroll (2003)]. Following the 
above formulation, we can approximate /3j(s) in ( 1 ) as -|- Z*Sj, where 
T contains the spatial coordinates of the monitoring locations, 7 ^ is a 2 x 1 
vector of fixed unknown coefficients, Z* = (W^Di^Wi^)”^/^, 

and d* is a RT — 3 X 1 vector distributed as MVN(0,t?I). 

We can succinctly incorporate this approximation into our modeling frame¬ 
work as follows. First, augment the design matrices Xj by appending the 
matrix T so that X* = (X^ T) for j = 1,... ,m (if Xj already contains the 
spatial coordinates as predictors, then this step is unnecessary). Addition¬ 
ally, append the vector 7 ^ to the aj so that cx*^ = {a.J 7 J) for j = 1 ,..., m. 
Last, letting be a block-diagonal matrix with diagonal elements {Z*}JLj^ 
and a* and B* be the stacked vectors of cx* and d* for j = 1,... ,m, re¬ 
spectively, we formulate the TPRS version of the spatio-temporal model as 
a linear mixed model, as follows: 

(5) Y = FX*q;*-L FZ|jB*-L FP-L V. 

We note the similarities between equations (2) and (5). In fact, Nychka 
showed that thin plate splines are equivalent to kriging using a generalized 
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covariance function [Nychka (2000)]. It is clear that the difference between 
LRK and TPRS has to do primarily with the choice of basis functions. 
However, we also emphasize that the TPRS bases are not dependent on any 
additional (e.g., range) parameters. Estimation of model parameters and 
prediction follows as described in Section 3.3. 

4. Computational considerations. Evaluation of (3) directly is compu¬ 
tationally intensive, with the number of computations growing as 0{N^). 
However, the computational burden can be eased considerably by taking ad¬ 
vantage of the block-diagonal nature of the T,b and Sy. Namely, Lindstrom 
showed that reformulation of (3) can reduce the computational burden to 
0{m^n^) [Lindstrom et al. (2013)]. Typically, low-rank models boast a com¬ 
putational advantage over their full-rank counterparts. Yet, reducing the 
computational burden in spatio-temporal data is nuanced. In the following, 
we discuss how the formulation of the /3-fields using either LRK or TPRS 
impacts computation. We illustrate the computational burden of calculating 
(3) by considering the determinant term jSj, employing a similar reformula¬ 
tion to that employed in Lindstrom et al. (2013) to exploit the block diagonal 
nature of Eg and Sy. Proofs of the following results and the correspond¬ 
ing reformulation of the full likelihood in (3) are provided in the Online 
Supplement [Olives et al. (2014)]. 

By application of known identities, it can be shown that 

jSj = jFZsEgZjF’^ + FEpF^ + Ey] 

(6) =lE^|lEpl|Ey|lEpi + F^E^iFl 

X jE^i + - E(;'F(fTEp'F + E^')-iF^E^i)FZpl. 

For highly unbalanced data like that which we typically encounter in MESA 
Air, (6) is dominated by the calculation of jEp^ -|-F~''Ey^Fj. Computation 
of this component grows at 0{nrfirfi), the same rate as the full-rank model. 

As mentioned, the full-rank spatio-temporal model originally published by 
Szpiro did not include the nugget, P, in the /3-fields [Szpiro et al. (2010)]. 
When the nugget is not present, the determinant jEj reduces to 

lS| = lFZBEpZ^)F^ + Eyl 

(7) 

= lEpl|Ey|lETi + Z^)FTE-iFZs|. 

Interestingly, in (7), computation will generally be dominated by calculation 
of -|-ZpF'''Ey^FZp], which grows at 0{m^K^). This makes it clear 
that, when the nugget is not present, reducing the rank of the /3-helds can 
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lead to some improvement in terms of computation. We note that in the case 
where the data are more balanced, it is possible that computation of |Sy | (or, 
equivalently, which grows at will dominate computation in 

both cases. 

In Figure 4, we plot the CPU time required for optimized log-likelihood 
evaluation in full-rank and reduced-rank models with iP = 25 with and with¬ 
out the nugget present for both LRK and TPRS. We see that for LRK and 
TPRS, as the number of sites increases, full-rank models take large steps in 
computation time required, whereas reduced-rank models grow much more 
slowly when a nugget is not present. However, there is very little difference in 
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Fig. 4. CPU time required for a single log-likelihood evaluation of LRK and TPRS mod¬ 
els for the ERA AQS and MESA Air NOx monitoring data in Los Angeles, California. 
Triangles indicate models where the rank of the spatial smooth is equal to the number of 
sites and circles indicate models where the rank of the smooth is equal to 25 in various 
depleted MESA Air data sets. 
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computational growth between full- and reduced-rank models as the number 
of sites increases when the nugget is present. 

5. Application to NOg, monitoring data in Los Angeles. We apply the 
proposed reduced-rank spatio-temporal models to NO^ data collected in the 
Los Angeles area as part of the MESA Air monitoring campaign and via the 
EPA regulatory network. 

5.1. Models considered. We fit a variety of models to the data which vary 
in three aspects: (1) the choice of spline basis, (2) the rank of /3-field smooth, 
and (3) the inclusion of the nugget. In all models considered, we employ two 
time trends (m = 2) as depicted in Eigure 3. Likewise, the residual z/-held is 
always specified as exponentially distributed with a nugget. And, last, all of 
the GIS covariates are present in each of the Xj matrices. 

5.1.1. Choice of spline basis. We have outlined two possible classes of 
spline bases, exponential (used in LRK) and thin plate splines. As previ¬ 
ously indicated, the use of exponential basis functions requires handling of 
the range parameters in each of the /3-fields by either hxing its value at some 
ad hoc data-derived value or through full optimization. To investigate the 
trade-off between optimization of an additional range parameter and fixing 
this parameter at an arbitrary conservative value, we assume the range pa¬ 
rameters in the /3-helds are both hxed, and in separate models that they 
are estimated. To assess the sensitivity to the fixed value, we set the range 
parameters in all fields equal to the maximum, one half, one quarter and 
one eighth of the observed maximum spatial range in the data (80.7 km). 
Additionally, to assess the sensitivity of model performance to the choice of 
spline basis, we fit TPRS smooths to the /3-fields. 

5.1.2. Rank of smooth. As a general rule of thumb, Ruppert, Wand and 
Carroll suggest that the number of knots, K, be chosen as max(20,min{150, 
n/4}) [Ruppert, Wand and Carroll (2003)]. In the case of our MESA Air 
and EPA data, this would result in iL = 71. Although this rule of thumb is 
convenient, it is unclear how the number of knots in the spatial component of 
the mean model will influence spatio-temporal prediction. Eor our purposes, 
we explore a variety of different ranks on spatio-temporal prediction, K = 
287,100,50 and 25. We note that the models with K = 287 correspond to 
full-rank models. 

Knot location can also play an important role in LRK. Kammann and 
Wand choose knot locations using efficient space-filling algorithms [as im¬ 
plemented by the cover. design() function in the R package fields] [Kam¬ 
mann and Wand (2003)]. In our primary investigations, we choose knot lo¬ 
cations using space-filling of monitoring sites within the study area (see Eig¬ 
ure 5). Although space-filling of observed locations is a convenient approach 
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Fig. 5. Map of knot locations chosen by efficient spaee-fiUing of monitoring locations 
(small open circle) and of regular grid locations (large open circles). Small black dots rep¬ 
resent participant locations and crosses represent grid locations. “Home outdoor” monitors 
have been jittered for participant confidentiality. 

to choosing the knot locations in our analysis, it is natural to consider knots 
chosen at alternative locations. For example, an attractive option could be 
to specify knot locations on a regular grid over the study area. To inves¬ 
tigate, in addition to the primary analysis, we also fit models where knot 
locations are chosen using space-filling of a regular grid of the convex hull 
of the study region, where each grid cell is approximately 2.5 kilometers on 
each side (see Figure 5). 

5.1.3. Nugget effect. Given the analytical findings suggesting that 
reduced-rank modeling leads to a computational advantage in the case when 
the nugget is not present, we fit models both with and without the nugget. 
However, we note that while analytically feasible, models which exclude the 
nugget from the /3-fields are less conceptually defensible. Namely, exclusion 
of the nugget from the /3-fields makes it difficult for the model to capture 
fine-scale variability in the mean process. Moreover, preliminary investiga¬ 
tions showed that very low-rank smooths in models without a nugget in 
the /3-fields were unstable. As such, we present a limited set of results for 
reduced-rank models where the nugget is not present. 

5.2. Model validation. We employ cross-validation to assess model pre¬ 
dictive performance. Our primary interest is in prediction of long-term aver¬ 
ages of NOx concentrations. Unfortunately, in this data set there are only 26 
AQS and/or MESA “fixed sites” that provide adequately long time-series for 
long-term average validation. These sites tend to be more homogeneous in 
their geographic covariate distribution and have larger spatial spread when 
compared to MESA participant locations, which could potentially limit our 
ability to adequately assess predictive performance. 

As such, in addition to cross-validation of AQS and MESA “fixed sites,” 
we also consider cross-validation of MESA “community snapshot” and “home 
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outdoor” sites. We apply tenfold cross-validation to each type of monitor. In 
each of these three scenarios, all remaining data are used to estimate model 
parameters and to predict at left-out locations. 

Due to the varying nature of sampling at sites in each of the three sce¬ 
narios, Lindstrom suggests calculating RMSE and B? slightly differently 
in each case [Lindstrom et al. (2013)]. At “fixed”/AQS sites, we calculate 
RMSE and B? metrics on both the 2-week and long-term average scales. 
Long-term averages at left-out sites are computed only over times where 
data are observed, so that 

exp{y(s,r)} 

T-.3y{s,T) 


The cross-validated B? on the long-term average scale is given by [Szpiro, 
Sheppard and Lumley (2011)] 


( 8 ) 


B^ = min< 0,1 — 


RMSE(c(s))^ ) 
Var(c(s)) J’ 


Eor the second scenario, we perform cross-validation of the “community 
snapshot” locations. We cross-validate all three sampling periods/seasons 
simultaneously and calculate cross-validated RMSE and B? by season. Do¬ 
ing so allows us to assess the spatial predictive ability of the model across 
multiple seasons. Likewise, as each of the “community snapshot” locations 
were sampled during the same two-week periods, we can view the result¬ 
ing metrics as representative of the pure spatial predictive capacity of the 
model. 

Last, we also consider cross-validation of “home outdoor” sites. As the 
“home outdoor” sites are repeatedly sampled over time and typically at 
different time points, much of the B? is likely to reflect a temporal signal, 
which is strong in these data. As such, in addition to the raw cross-validated 
B?, we also consider a de-trended version of the B? where the variance, 
Var(c(s)), in (8) is replaced by the variance of observations after removing 
the predictions from a reference model that accounts for (some) temporal 
variability. Here, we use a reference model based on the spatial average of 
measurements at AQS/“fixed sites” at each time point. Thus, the de-trended 
B? represents the improvement in performance of our models compared to 
central site predictions commonly used in air pollution epidemiology studies 
[Pope et al. (1995)]. 


5.3. Comparison with other reduced-rank spatio-temporal models. Al¬ 
though the current model was developed specifically to address the com¬ 
plexities arising in the context of MESA Air, a number of other methods for 
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reduced-rank spatial and spatio-temporal modeling have been published, in¬ 
cluding fixed-rank filtering, Gaussian Markov random field approximations, 
covariance tapering, predictive processes and generalized additive models. 
Unfortunately, hxed-rank filtering is not available in an off-the-shelf pack¬ 
age, and implementing this model for these data is a project unto itself. We 
further note that the application of Gaussian Markov random field approx¬ 
imations and covariance tapering in this setting is nuanced and may not 
result in any computational savings for these data. See the Online Supple¬ 
ment [Olives et al. (2014)] for further discussion of the application of these 
two approaches in the current modeling framework. 

As mentioned previously, there appears to be an explicit correspondence 
between predictive processes and LRK, as noted in Banerjee et al. (2008). 
As such, formally modeling the /3-fields in ( 1 ) as reduced-rank predictive 
processes would not provide any additional insight into this work. That 
being said, one version of a predictive process spatio-temporal model is im¬ 
plemented in the spBayes package in R. Namely, the function spDynLM fits 
the following model: 

y{s,t) = Xi(s)/34 -Lut(s) -t-et(s), t = l, 2 ,...,r, 
et(s) ~iV(0,T|), 

/3t = I5t-i + T]t, rjt ~ A^(0, S^), 

/3o ~iV(mo,So), 

ut{s) =ut-i{s) +wt{s), wt{s) ~ GP(0,C't(-,6»t)), 

uo(s) = 0. 

The spatial process wt, here assumed to be exponential, can be replaced 
with a predictive process of reduced rank to reduce computational burden. 
This model signihcantly deviates from our own and may not perform well in 
the context of such highly imbalanced data as that which we analyze here. 
Nevertheless, we apply it to our data in an effort to make a fair comparison 
between published approaches to reduced-rank spatio-temporal modeling 
and our method. Specifically, we fit two models: 

1. full-rank model {K = 287) for all wt fields, and 

2. reduced-rank (K = 50) for all wt with knots chosen on a grid. 

We note that the spDynLM function requires that knots be chosen on a grid 
when utilizing the reduced-rank predictive process machinery. In both cases, 
we fit the models assuming the following priors for the 

ll^t Unif(l/(0.9 X max distance),3/(0.05 x max distance)), 

O'/ ~ InvGamma(2,10), 
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~ InvGamma(2,5), 

~ InvWish(2,0.001Ip). 

These priors are largely based on the example code available in the spDynLM 
documentation, with some small changes to reflect the data. Model pre¬ 
dictions were the median of 500 posterior draws, after a burn-in period of 
1500. We cross-validated these models for “fixed sites” using the same cross- 
validation groups as before. 

Last, for an additional comparison with methods available in off-the-shelf 
software, we considered a generalized additive model that reformulates the 
mean process without resorting to a dynamic model. Namely, we 

replaced n{s,t) with the following: 

X{s)a + r]t +g{s)+ h{s,t). 

Here both g and h are modeled using TPRS. For investigating models with 
spatial rank of K, we set the degrees of freedom for g equal to K and the 
degrees of freedom for h equal to iC x 14 (e.g., when K = 50, h has 700 df), 
where 14 is the number of years represented in the data. Note that both g 
and h can be viewed as penalized regression splines with structure similar to 
what we outline in the paper. But for h, we are now assuming a nonseparable 
model for space and time which differs from the tensor product approach 
used in our model. Moreover, we do not rely on predefined temporal basis 
functions to model time. The rjt are i.i.d. Gaussian random effects that 
capture nonsmooth temporal variation. Note that the zz-field remains the 
same as outlined in the paper. We fit this model using the gamm function in 
the mgcv package in R. 

6. Results. 

6.1. Performance of proposed reduced-rank models in LA. Table 2 shows 
the results of the cross-validation at “fixed sites” for models when the nugget 
is present. For LRK models, the choice of range does not appear to be 
a strong determinant of the predictive performance, with fully optimized 
models performing nearly as well as those models with the range parameter 
fixed at various values. Likewise, TPRS models exhibit highly competitive 
predictive performance with a slight edge over LRK models at lower ranks 
for long-term averages. Gross-validated Pf values stay relatively consistent 
across ranks until K = 25, at which point both 2-week and long-term average 
predictive scores drop off. In all cases, models with some spatial smoothing 
{K > 0) perform better than models without any smoothing {K = 0). 

Table 3 show the results of cross-validation at “community snapshot” 
sites. We typically see the best performance in the Winter as compared with 
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Table 2 

Cross-validated RMSE and for “fixed sites” when nugget is present. E? have been 
multiplied by 100 for presentation 


RMSE 


Basis / K 


287 

100 

50 

25 

0 

287 

100 

50 

25 

0 

2-wk 

LRK ((/) = 

est) 

15.43 

15.52 

15.72 

16.35 

17.82 

85 

85 

85 

83 

80 

LRK ((/) = 

max) 

15.64 

15.64 

15.83 

15.87 

17.82 

85 

85 

84 

84 

80 

LRK ((/) = 

max/2) 

15.52 

15.56 

15.24 

16.14 

17.82 

85 

85 

86 

84 

80 

LRK (</) = 

max/4) 

15.31 

15.32 

15.33 

15.74 

17.82 

85 

85 

85 

85 

80 

LRK (</) = 

max/8) 

15.04 

15.08 

15.13 

15.59 

17.82 

86 

86 

86 

85 

80 

TPRS 


16.38 

15.29 

15.11 

16.01 

17.82 

83 

85 

86 

84 

80 

LTA 

LRK (</) = 

est) 

10.43 

10.56 

11.11 

11.41 

12.41 

68 

67 

64 

62 

55 

LRK ((/) = 

max) 

10.48 

10.46 

10.53 

10.84 

12.41 

68 

68 

67 

65 

55 

LRK {<j) = 

max/2) 

10.40 

10.39 

10.08 

10.72 

12.41 

68 

68 

70 

66 

55 

LRK {(f = 

max/4) 

10.30 

10.31 

10.33 

11.04 

12.41 

69 

69 

69 

64 

55 

LRK ((/) = 

max/8) 

10.26 

10.28 

10.36 

10.85 

12.41 

69 

69 

68 

65 

55 

TPRS 


10.83 

9.99 

9.88 

10.60 

12.41 

65 

71 

71 

67 

55 


the Fall and Spring seasons. Once again, there appears to be little differ¬ 
ence in model performance as the choice of range parameters varies. TPRS 
models continue to compete strongly with LRK models. The rank of the /3- 
field smooth does not tend to influence performance heavily, although again 
spatial smoothing at any rank does tend to improve predictive performance. 

Table 4 shows the results of the cross-validation study at “home outdoor” 
locations. Here, the choice of range parameter model appears to have even 
less of important role locations than it did at “fixed sites.” Namely, cross- 
validated RMSE increases only slightly, resulting in a minimal decrease in 
as the rank decreases in the raw home predictions. Detrended did 
show some decay as the rank decreased, but still remained relatively high. 
TPRS models performed as well as LRK models across ranks. Again, models 
with some spatial smoothing outperformed those models with no smoothing. 

Figure 6 compares the cross-validated R? for a set of models of rank 
K = 287,100,50 and 25 with and without the nugget present in the /3-fields 
at AQS/“fixed sites,” “community snapshot” and “home outdoor” locations. 
Note, for LRK results, the range parameter has been estimated from the 
data. The figure suggests that while full rank models {K = 287) are com¬ 
parable across these two specifications, predictive performance of models 
without the nugget in the /3-fields tend to drop off rapidly as the rank of the 
smooth decreases, particularly in the case of LRK, where in select cases the 
R? decreases to zero when K = 25. TPRS models tend to be more robust, 
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Table 3 

Cross-validated RMSE and for “community snapshot” locations when nugget is 
present. E? have been multiplied by 100 for presentation 


RMSE 


Basis / K 

287 

100 

50 

25 

0 

287 

100 

50 

25 

0 

Summer 

LRK (0 = est) 

6.74 

6.71 

6.73 

6.98 

6.97 

66 

66 

66 

63 

63 

LRK {(j) = max) 

6.68 

6.67 

7.03 

6.70 

6.97 

66 

66 

63 

66 

63 

LRK (0 = max/2) 

6.69 

6.68 

6.66 

6.96 

6.97 

66 

66 

66 

63 

63 

LRK (/) = max/4) 

6.71 

6.72 

6.66 

6.76 

6.97 

66 

66 

66 

65 

63 

LRK {(j> = max/8) 

6.76 

6.74 

6.83 

6.82 

6.97 

65 

66 

65 

65 

63 

TPRS 

6.62 

6.71 

6.70 

6.72 

6.97 

67 

66 

66 

66 

63 

Fall 

LRK {(j> = est) 

11.64 

11.61 

11.99 

11.55 

11.78 

75 

76 

74 

76 

75 

LRK {(j) = max) 

11.66 

11.59 

11.75 

11.83 

11.78 

75 

76 

75 

75 

75 

LRK {cf = max/2) 

11.66 

11.64 

11.61 

11.97 

11.78 

75 

75 

76 

74 

75 

LRK {(f) = max/4) 

11.65 

11.63 

11.53 

11.35 

11.78 

75 

75 

76 

77 

75 

LRK {(f) = max/8) 

11.66 

11.89 

11.95 

11.86 

11.78 

75 

74 

74 

74 

75 

TPRS 

11.92 

12.10 

12.14 

12.11 

11.78 

74 

73 

73 

73 

75 

Winter 

LRK {(f) = est) 

13.01 

12.92 

12.98 

12.99 

15.32 

77 

77 

77 

77 

68 

LRK {(f) = max) 

13.04 

12.94 

13.11 

14.00 

15.32 

77 

77 

76 

73 

68 

LRK {(j) = max/2) 

13.03 

12.99 

12.59 

13.63 

15.32 

77 

77 

78 

75 

68 

LRK {(f) = max/4) 

13.02 

12.98 

12.63 

13.8 

15.32 

77 

77 

78 

74 

68 

LRK {(f) = max/8) 

13.05 

13.51 

12.65 

13.95 

15.32 

77 

75 

78 

73 

68 

TPRS 

13.27 

13.04 

13.08 

14.19 

15.32 

76 

77 

77 

72 

68 


although the decrease in in TPRS models without a nugget tends to be 
greater than in TPRS models with a nugget. 

Figure 7 compares the results of fitting full and LRK models to the MESA 
Air data when the knots were chosen using space-filling of either monitoring 
locations or a regularly spaced grid of locations. Generally speaking, models 
where knots were chosen at monitoring sites performed better than those 
where knots were chosen at grid locations. 

6.2. Performance of other reduced-rank spatio-temporal modeling meth¬ 
ods. We found that the spDynLM implementation did not work well for 
our data, possibly due to the large imbalance across space and time. In 
both models {K = 50,287), the time-varying range parameter was not well 
identified and varied significantly, thus resulting in poor characterization 
of the rate of spatial decay. The temporal sparsity of the data may also 
have contributed to the poor performance due to the dynamic nature of 
the model’s temporal trend. While the in-sample fits for these models are 
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Table 4 


Cross-validated RMSE and E? for “home outdoor” locations when nugget is present. B? 


have been multiplied by 100 for presentation 


Basis / K 



RMSE 








287 

100 

50 

25 

0 

287 

100 

50 

25 

0 

Raw 











LRK ((/) = est) 

5.51 

5.52 

5.88 

5.89 

7.03 

93 

93 

92 

92 

88 

LRK ((/) = max) 

5.54 

5.54 

5.71 

6.41 

7.03 

93 

93 

92 

90 

88 

LRK ((/) = max/2) 

5.53 

5.54 

5.28 

6.01 

7.03 

93 

93 

93 

92 

88 

LRK {(j) = max/4) 

5.53 

5.58 

5.52 

6.14 

7.03 

93 

93 

93 

91 

88 

LRK ((/) = max/8) 

5.52 

5.49 

5.48 

6.26 

7.03 

93 

93 

93 

91 

88 

TPRS 

5.52 

5.50 

5.53 

6.00 

7.03 

93 

93 

93 

92 

88 

Detrended 











LRK ((/) = est) 






84 

84 

82 

82 

75 

LRK {(j) = max) 






84 

84 

83 

79 

75 

LRK ((/) = max/2) 






84 

84 

86 

81 

75 

LRK ((/) = max/4) 






84 

84 

84 

81 

75 

LRK ((/) = max/8) 






84 

84 

85 

80 

75 

TPRS 






84 

84 

84 

81 

75 


quite good, the out-of-sample predictions are highly variable, resulting in 
cross-validated equal to zero for all scenarios considered. In Figure 8 we 
show the scatter plots of observed and predicted values in fitted models. 
The histograms in this same figure represent the distribution of predictions 
at unobserved times and locations. We note that these are on the log-scale, 
so that when exponentiated to the native scale, many predicted values at 
unobserved times/locations are extremely large. 

The results of our gamm implementation were only marginally better. 
While the in-sample fits of this approach were more promising (see Fig¬ 
ure 9), the predictions were nowhere near the caliber of those achieved using 
our model. Inspection of the residuals suggests that there remains signifi¬ 
cant temporal correlation that is unaccounted for by the mean model. We 
found that the cross-validated was equal to 0 on both the two-week and 
long-term average scale using this approach. This low R^ was driven by the 
presence of outlying predictions for a handful of sites in two different cross- 
validation groups. Additionally, the model failed to converge for a single 
cross-validation group. 

7. Discussion. This paper focuses on presentation of LRK and TPRS 
representations of the mean process in the spatio-temporal model proposed 
by Szpiro et al. (2010), Sampson et al. (2011), and Lindstrom et al. (2013). 
Our approach allows for a reduced-rank representation of the /3-fields in 
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LRK at Community Snapshot Locations 


TPRS at Community Snapshot Locations 


Summ«r 


Fall 


s 


0 , 8 - 

0 , 6 - 

0,4- 

0 , 2 - 

0 , 0 - 





Fig. 6. Comparison of cross-validated at “fixed site,” “community snapshot,” and 
“home outdoor” locations using low-rank kriging and TPRS. 


the mean process of the original model, which tends to be the most time- 
consuming piece to evaluation in likelihood optimization. In certain cases, we 
have shown that such reduced-rank representations of the /3-fields can lead 
to a computational advantage over the full rank specification. Namely, when 
the nugget of the /3-field is not present, we have shown that our low-rank 
approach leads to slower growth in the CPU time required for likelihood 
evaluation. 

The formulation of the /3-fields in the mean process of the model as spatial 
splines is attractive for a number of other reasons. For example, oftentimes 
predictions of air pollution concentrations are used as inputs into health 
models to estimate health effects. Typically, the predictions are based on 
spatially misaligned data and ignoring this fact can lead to biased results 
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Fixed Site 
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LTA 


Community Snapshot 

Winter Summer Fali 


Home Outdoor 

Raw Detrended 
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I 


Fig. 7. Differences between cross-validated in LRK models with knots chosen at moni¬ 
toring locations and on a regular grid by rank. Models assume that the nugget, P, is present 
in all l3-fields and the range parameters are estimated by maximum likelihood. 
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Fig. 8. (Top row) In-sample fits for full-rank models fit in spBayes. (Bottom row) In- 
-sample fits for reduced-rank models (K — 50) fit in spBayes. Histograms represent the 
distribution of posterior predictions at time points/locations without observed data. 
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Two-Week 



T-1-1-1-1” 

2.5 3.0 3.5 4.0 4.5 


Observed 


LTA 



o 

o 

“I-1-1-r 

3.0 3.5 4.0 4.5 

Observed 


Fig. 9. In-sample fits for reduced-rank models fit in mgcv with K — 50 on the two-week 
scale (left) and long-term average scale (right). 


and overly optimistic standard errors [Szpiro, Sheppard and Lumley (2011)]. 
The expression of the /3-helds as splines places GIS covariates and spatial 
smoothing on more equal footing. Namely, in this form we can think of the 
GIS covariates and the spatial basis functions as unpenalized and penalized 
spatial covariates, respectively. This interpretation leads to a more coher¬ 
ent approach to measurement error correction for spatially misaligned data 
[Szpiro and Paciorek (2013)]. It is also important to note that the computa¬ 
tional advantage gained in log-likelihood evaluation extends analogously to 
prediction, thus reducing computation time needed to predict at potentially 
many new locations. 

For LRK models, we explored the choice of range parameters of prediction, 
ranging from the case where the range was fully estimated from the data to 
the case where it was fixed at an arbitrary conservative value indicated by 
the data. In the scenario when the range parameter is fixed, we showed that 
the original specihcation of the full-rank model can also be interpreted as a 
standard penalized spatial spline. 

Likewise, we discussed the parallels between kriging and TPRS. We em¬ 
phasize that a limitation of the kriging basis functions is the reliance on 
the range parameter and that TPRS is not subject to the same limitation. 
That being said, we note that there is an equivalence between thin plate 
splines and kriging using a Matern-covariance with infinite range [Wahba 
(1981), Nychka (2000), Kimeldorf and Wahba (1970)]. As such, one might 
view the use of TPRS as making an implicit assumption about the range 
parameter. The fact that TPRS and LRK were competitive in our results 
indicates that TPRS is a valid and attractive option for spatial smoothing 
in these models. To further this argument, we performed additional analyses 
(results included in the Online Supplement [Olives et al. (2014)]) comparing 
out-of-sample prediction variances and AIG as a means of model selection. 
These analyses indicated that TPRS models tended to result in more stable 
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prediction variances across rank specification when compared to LRK mod¬ 
els. However, there was little notable difference between AIC values in LRK 
and TPRS models. Rather, AIC values indicated full-rank LRK models were 
preferable to reduced-rank ones in all cases. TPRS models with K = 100 had 
the lowest AIC. 

Our approach to model assessment relies on cross-validation. As we are 
primarily interested in prediction of long-term averages, the cross-validation 
approach outlined isolates the spatial predictive capacity of the models. We 
applied our approach to ambient MESA Air and EPA NO^; data collected 
in the Los Angeles area as well as traditional road covariates and Caline 
point predictions models. We found that generally speaking, the choice of 
the range parameter in the LRK exponential spatial basis functions had 
little impact on the model performance. In fact, reducing the rank of the 
model tended to also have little impact in most cross-validation scenarios for 
ranks of moderate size {K = 50,100). We note that the recommendation of 
Ruppert, Wand and Carroll {K = 71 for the MESA Air data) falls squarely 
in this range [Ruppert, Wand and Carroll (2003)]. However, we found that 
reduction of the rank of the /3-helds below K = 50 tended to noticeably 
impact model predictions. This impact was further exacerbated by exclusion 
of the nugget in the /3-fields. This finding is not a surprise, as exclusion of 
the nugget in the /3-helds amounts to attributing all extra variation in the 
mean beyond what is explained by the CIS covariates to the spatial /3-helds. 
Reduction of the rank of the smooth of these random helds results in a 
spatial smooth that is unlikely to be able to capture spatial heterogeneity. 

This unfortunate hnding is at odds with the goal of reducing the computa¬ 
tional burden of full-rank spatio-temporal likelihood evaluations. Although 
the original specihcation published by Szpiro et al. did not include a nugget 
in the /3-held, it is our feeling that such models are less defensible than 
those that include a nugget, since it is unlikely that the CIS covariates in 
the model account for all nonsmooth spatial variation. 

That being said, the results herein described are based on a single data 
setting. Indeed, there almost surely exists other data sets where inclusion of 
a nugget in the /3-fields is contraindicated. In these cases, use of a moderate 
rank smooth could lead to both a computational and predictive advantage. 

Last, we examined a number of other approaches and specification to 
modeling NO^ concentrations in the current data set and found poor per¬ 
formance for two off-the-shelf packages. Our findings confirm that the long 
history of methodological development of the model under study in the con¬ 
text of modeling air pollution exposures for MESA Air was indeed well 
guided and that current off-the-shelf packages are not ideal for analyzing 
these data. Future research should, however, include investigations into the 
extension of the current model using covariance tapering of either the /3- 
fields covariance or even of the overall covariance matrix S. 
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The LA NO 3 ; data application is meant to exemplify the current meth¬ 
ods. However, we note that this model is being applied more broadly to 
four separate pollutants in six major cities in the United States as part of 
MESA Air [Keller et al. (2014)]. Furthermore, a rigorous approach to model 
selection, that varies the number of trends, covariates and /3-field models, is 
also being applied to choose the best performing predictive models. Taking 
into account cross-validation, this effort includes the fitting of hundreds of 
models, representing a significant investment of time on the part of MESA 
Air investigators. To further emphasize the impact of the current meth¬ 
ods, we performed a separate set of analyses replicating a large subset of 
the cross-validation scenarios for NO^ data in Los Angeles considered by 
MESA Air investigators in their development of exposure models for use in 
primary MESA Air health analyses. We found that TPRS models achieved 
highly competitive results in roughly half the time, suggesting that had these 
methods been available during model development, potentially hundreds of 
computer hours could have been saved during the model development pro¬ 
cess. As we move toward incorporating the current methods into the highly 
optimized SpatioTemporal package, and further optimize the reduced-rank 
model fitting procedures, we expect that the gains in computational time 
will increase in orders of magnitude, to roughly 5 times faster. As such, we 
believe that the current work will continue to have tangible implications 
for MESA Air investigators and their collaborators who continue to use the 
MESA Air spatio-temporal model as the basis for exposure assessment in 
air pollution cohort studies. 
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SUPPLEMENTARY MATERIAL 

Supplement to “Reduced-rank spatio-temporal modeling of air pollu¬ 
tion concentrations in the Multi-Ethnic Study of Atherosclerosis and Air 
Pollution” (DOL 10.1214/14-AOAS786SUPP; .pdf). We provide a detailed 
derivation of the optimized likelihood, comparisons of the prediction vari¬ 
ances, discussion model selection by AIC for the paper “Reduced-rank spatio- 
temporal modeling of air pollution concentrations in the Multi-Ethnic Study 
of Atherosclerosis and Air Pollution” by Casey Olives, Lianne Sheppard, Jo¬ 
han Lindstrom, Paul D. Sampson, Joel D. Kaufman and Adam A. Szpiro. 
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